Title: A Hybrid Framework for Protein Sequences Clustering and Classification Using Signature Motif Information
Authors: Wei‐Bang Chen and Chengcui Zhang
Abstract
In this paper, we propose an unsupervised hybrid framework for protein sequence clustering and classification that incorporates protein structural motif information. The proposed framework consists of three stages: protein structural motif scanning, hybrid clustering, and sequence classification. Incorporating the protein structural motifs detected by the ScanProsite service provides a better measure of sequence similarity. The proposed two-phase hybrid clustering approach combines the strengths of hierarchical and partition clustering. Phase I adopts hierarchical agglomerative clustering to pre-cluster multi-aligned sequences. Phase II performs partition clustering, which initializes its partition from the result of Phase I and uses profile Hidden Markov Models (HMMs) to represent clusters. The profile HMMs are then stored in the database and used to classify unknown sequences, which is done by finding the best alignment of a sequence to each existing profile HMM. Our experiments demonstrate the effectiveness and efficiency of the proposed framework for biological sequence clustering and classification.
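The two clustering phases and the classification step can be illustrated with a small sketch. This is not the authors' implementation, and several pieces are assumptions: the sequences are taken to be already multi-aligned to equal length; the profile HMM is replaced by a simpler match-column log-odds profile (no insert or delete states, so "best alignment" reduces to scoring a pre-aligned sequence column by column); Phase I uses average linkage; and the motif-aware similarity is a hypothetical blend of sequence identity and the Jaccard overlap of motif sets, weighted by a made-up parameter `alpha`. The motif labels stand in for ScanProsite hits and are placeholders, and all function names are illustrative.

```python
import math
from collections import Counter

# 20 amino acids plus '-' so gapped alignment columns can also be scored.
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY-"


def identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in cols) / len(cols) if cols else 0.0


def similarity(a, b, motifs_a, motifs_b, alpha=0.5):
    """Hypothetical blend: (1 - alpha) * identity + alpha * Jaccard overlap of motif sets."""
    union = motifs_a | motifs_b
    jaccard = len(motifs_a & motifs_b) / len(union) if union else 0.0
    return (1 - alpha) * identity(a, b) + alpha * jaccard


def agglomerate(dist, k):
    """Phase I: average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                d /= len(clusters[p]) * len(clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p].extend(clusters.pop(q))  # q > p, so index p is unaffected
    return clusters


def build_profile(seqs, pseudo=1.0):
    """Column-wise log-odds profile against a uniform background.
    Simplified stand-in for a profile HMM: match columns only."""
    bg = 1.0 / len(SYMBOLS)
    total = len(seqs) + pseudo * len(SYMBOLS)
    profile = []
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs)
        profile.append({c: math.log((counts[c] + pseudo) / total / bg) for c in SYMBOLS})
    return profile


def score(profile, seq):
    """Sum of per-column log-odds for a sequence already aligned to the profile."""
    return sum(col[c] for col, c in zip(profile, seq))


def best_profile(profiles, seq):
    """Classification: index of the highest-scoring stored profile."""
    return max(range(len(profiles)), key=lambda k: score(profiles[k], seq))


def _canon(clusters):
    """Order-independent form of a partition, used to detect convergence."""
    return sorted(sorted(c) for c in clusters)


def refine(seqs, clusters, max_iter=10):
    """Phase II: partition refinement seeded by Phase I; reassign until stable."""
    for _ in range(max_iter):
        profiles = [build_profile([seqs[i] for i in c]) for c in clusters]
        labels = [best_profile(profiles, s) for s in seqs]
        new = [[i for i, lab in enumerate(labels) if lab == k] for k in range(len(profiles))]
        new = [c for c in new if c]  # drop clusters that lost all members
        if _canon(new) == _canon(clusters):
            break
        clusters = new
    return clusters, [build_profile([seqs[i] for i in c]) for c in clusters]


# Toy pre-aligned sequences with placeholder motif labels (not real ScanProsite hits).
seqs = ["ACDEFG", "ACDEFA", "ACDEYG", "WWKLMN", "WWKLMP", "WYKLMN"]
motifs = [{"motif_A"}] * 3 + [{"motif_B"}] * 3
n = len(seqs)
dist = [[1 - similarity(seqs[i], seqs[j], motifs[i], motifs[j]) for j in range(n)]
        for i in range(n)]
clusters, profiles = refine(seqs, agglomerate(dist, k=2))
print(clusters)                          # [[0, 1, 2], [3, 4, 5]]
print(best_profile(profiles, "ACDEFN"))  # 0: the unknown sequence joins the first cluster
```

In this sketch, seeding Phase II with the Phase I partition instead of random starting clusters is what makes the scheme hybrid: the hierarchical pass fixes the number and rough composition of the clusters, and the partition pass only moves sequences whose profile score favors another cluster.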
Similar resources
A Multimodal Data Mining Framework for Revealing Common Sources of Spam Images
This paper proposes a multimodal framework that clusters spam images so that ones from the same spam source/cluster are grouped together. By identifying the common sources of spam images, we can provide evidence in tracking spam gangs. For this purpose, text recognition and visual feature extraction are performed. Subsequently, a two-level clustering method is applied where images with visually...
An Image Clustering and Feedback-based Retrieval Framework
Most existing object-based image retrieval systems are based on single object matching, with its main limitation being that one individual image region (object) can hardly represent the user’s retrieval target, especially when more than one object of interest is involved in the retrieval. Integrated Region Matching (IRM) has been used to improve the retrieval accuracy by evaluating the overall ...
A Supervised Machine Learning Approach of Extracting and Ranking Published Papers Describing Coexpression Relationships among Genes
In this chapter, we describe a framework to extract information about coexpression relationships among genes from published literature using a supervised machine learning approach, and later rank those papers to provide users with a complete specialized information retrieval system. We use Dynamic Conditional Random Fields (DCRFs) for training our classification model. Our approach is based on...
Authorship Detection and Encoding for eBay Images
This paper describes a framework to detect authorship of eBay images which contains three modules: editing style summarization, classification, and multi-account linking detection. For editing style summarization, three approaches, namely the edge-based approach, the color-based approach, and the color probability approach, are proposed to encode the common patterns inside a group of images with ...
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...